About

Analytical objectives:

Data sources and software:

Elections data

Zip files were downloaded from the following sites. The data files inside were already in clean CSV format.

Data processing included the following:

  • Aggregating vote totals to the county level
  • Creating Democratic/Republican vote percentages
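The two processing steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code. The record layout (county FIPS, party label, votes) and the party labels are hypothetical. The denominator for the percentages is also an assumption: all votes cast in the county, including third parties.

```python
from collections import defaultdict

# Hypothetical precinct-level records: (county FIPS, party, votes).
# The layout and party labels are assumptions for illustration only.
rows = [
    ("01001", "democrat", 120),
    ("01001", "republican", 300),
    ("01001", "other", 30),
    ("01003", "democrat", 500),
    ("01003", "republican", 400),
    ("01003", "democrat", 100),
]


def county_vote_shares(records):
    """Aggregate votes to the county level and compute party percentages.

    Assumption: the denominator is every vote cast in the county,
    including third parties. A two-party share would divide by
    democrat + republican votes only.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for fips, party, votes in records:
        totals[fips][party] += votes  # sum all precinct rows within each county

    shares = {}
    for fips, by_party in totals.items():
        total = sum(by_party.values())
        shares[fips] = {
            "dem_pct": 100 * by_party["democrat"] / total,
            "rep_pct": 100 * by_party["republican"] / total,
        }
    return shares


print(county_vote_shares(rows))
```

Swapping the denominator to the two-party total changes only the `total` line. That choice matters in 2016, when third-party shares were unusually high in some counties.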

ACS data

Data was downloaded from IPUMS USA using its online data extract system. The data cover 2005-2016, the years for which county FIPS codes are available. The following variables were used:

Data was aggregated to the county level using weighted statistics based on the IPUMS person weight variable (PERWT).
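Person-weighted aggregation can be sketched with the standard library. The sketch computes a weighted mean, sum(w * x) / sum(w), per county. The record layout, the 0/1 poverty flag, and the combined 5-digit FIPS key are hypothetical; IPUMS stores state and county codes separately as STATEFIP and COUNTYFIP, so they would need to be combined first. The weights play the role of PERWT.

```python
from collections import defaultdict

# Hypothetical person-level records: (5-digit county FIPS, person weight, flag).
# The weight stands in for PERWT; the 0/1 flag (e.g., "in poverty") and the
# combined FIPS key are assumptions for illustration only.
persons = [
    ("06037", 120.0, 1),
    ("06037", 80.0, 0),
    ("06037", 200.0, 0),
    ("48201", 50.0, 1),
    ("48201", 150.0, 1),
]


def weighted_county_share(records):
    """Return the person-weighted share of people with flag == 1 per county.

    Weighted mean = sum(w_i * x_i) / sum(w_i). Each sampled person counts
    as roughly w_i people in the county population.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for fips, weight, flag in records:
        num[fips] += weight * flag
        den[fips] += weight
    return {fips: num[fips] / den[fips] for fips in den}


print(weighted_county_share(persons))
```

An unweighted mean would give 1/3 for county 06037. The weighted share is 0.3, because the person flagged in poverty carries less weight than the other two.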

ACS data from IPUMS USA, University of Minnesota, www.ipums.org

News data

Data was downloaded from Factiva in 100-article chunks. The search parameters were as follows:

  • Text search: election AND (trump OR clinton)
  • Date range: June 1, 2016 to November 8, 2016 (Election Day)
  • Sources: The New York Times OR The Wall Street Journal

The search returned 3,013 articles. The raw data was downloaded in RTF format and converted to plain text with the striprtf package in Python. The text was then cleaned, tokenized, and stemmed, and stop words were removed, all using nltk. Sentiment was scored with nltk's VADER sentiment analyzer, and TF-IDF analysis was performed with the nltk package. Finally, TensorFlow was used to train a bidirectional LSTM neural network that predicts the publication (The New York Times or The Wall Street Journal) from the cleaned, tokenized text.
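The cleaning and TF-IDF steps can be illustrated with a dependency-free sketch. This is not the project's nltk pipeline. The stop-word list is a tiny hypothetical stand-in for nltk's English list, and stemming is omitted so the sketch stays self-contained. The weighting is the classic formula: tf = term count / document length, and idf = log(N / document frequency). Library implementations such as scikit-learn's TfidfVectorizer add smoothing and normalization by default, so their absolute scores differ.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; the project used nltk's English list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is", "for"}


def clean_tokens(text):
    """Lowercase, keep alphabetic tokens, and drop stop words.

    Stemming (done with nltk in the project) is omitted here.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    tf  = count of term in doc / number of tokens in doc
    idf = log(N / number of docs containing the term)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term at most once per document

    scores = []
    for doc in docs:
        if not doc:  # avoid dividing by zero for empty documents
            scores.append({})
            continue
        counts = Counter(doc)
        length = len(doc)
        scores.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical headlines, not articles from the Factiva download.
articles = [
    "Trump leads in the polls on election eve",
    "Clinton campaign and the election debate",
    "Markets react to the election",
]
docs = [clean_tokens(a) for a in articles]
for doc_scores in tf_idf(docs):
    print(sorted(doc_scores.items(), key=lambda kv: -kv[1])[:3])
```

A term that appears in every document, such as "election" here, gets an idf of log(1) = 0. This is why search-query terms tend to drop out of a TF-IDF ranking built on articles selected by that same query.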

Software versions:

R packages
  • tidyverse==1.3.0
  • sf==0.9.6
  • reticulate==1.16
  • rmarkdown==2.4.6
  • flexdashboard==0.5.2
  • ggplot2==3.3.2
  • pacman==0.5.1
Python packages
  • python==3.8.5
  • striprtf==0.0.12
  • pandas==1.0.5
  • numpy==1.18.5
  • dateutil==2.8.1
  • seaborn==0.11.0
  • matplotlib==3.2.2
  • nltk==3.5
  • plotly==4.10.0
  • re==2.2.1
  • sklearn==0.23.1
  • tensorflow==2.3.1

A polarized nation


Poverty rates

Racial and income differences


Hispanic population

Veteran distribution

Medicare vote

News Analysis of the 2016 Presidential Election


Negative Sentiment Timeline

Positive Sentiment Timeline


TF-IDF Analysis

Bigram Analysis

2016 Split-Ticket Voting


Presidential Election

U.S. Senate Election

U.S. House Election

State-level Election

Local-level Election


President vs U.S. House

President vs U.S. Senate

President vs State

President vs Local

County Factors in 2016 Outcome


Introduction

Association rules are a method for expressing IF-THEN relationships between variables. This network graph shows ACS demographic variables and the rules associated with U.S. counties choosing the Democratic or Republican candidate in the 2016 election. The shading of each rule indicates the degree of dependence among the variables in that rule. For background on the metrics, read here.
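The standard rule metrics can be illustrated with a minimal plain-Python sketch. It computes support, confidence, and lift for one rule over a handful of hypothetical county "transactions", which are sets of discretized traits plus the winner. The item names are invented for illustration. The dashboard text does not confirm that its shading encodes lift, although lift is a common choice for the "degree of dependence" in rule-graph plots.

```python
# Hypothetical county "transactions": each set holds discretized traits
# plus the 2016 winner. Item names are illustrative, not the dashboard's variables.
counties = [
    {"high_poverty", "rural", "winner=R"},
    {"high_poverty", "rural", "winner=R"},
    {"low_poverty", "urban", "winner=D"},
    {"high_poverty", "urban", "winner=D"},
    {"low_poverty", "rural", "winner=R"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(lhs, rhs, transactions):
    """Support, confidence, and lift for the rule IF lhs THEN rhs.

    support    = P(lhs and rhs)
    confidence = P(rhs | lhs) = support(lhs | rhs) / support(lhs)
    lift       = confidence / support(rhs)
    """
    supp_rule = support(lhs | rhs, transactions)
    confidence = supp_rule / support(lhs, transactions)
    lift = confidence / support(rhs, transactions)
    return supp_rule, confidence, lift


print(rule_metrics({"rural"}, {"winner=R"}, counties))
```

A lift of 1 means the IF and THEN sides are independent. Here IF rural THEN winner=R has a lift of 1/0.6 (about 1.67), meaning rural counties in this toy data pick the Republican candidate more often than independence would predict.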


Interactive decision rules output